Add dplyr tutorial port, clean up docs so documenter is happy #279

pdeffebach · 2021-08-01T18:57:44Z

No description provided.

bkamins · 2021-08-02T06:46:54Z

docs/src/dplyr.md

+
+## What is DataFramesMeta.jl?
+
+DataFramesMeta.jl is a Julia package to transform and summarize tabular data. It provides a more convenient syntax to work with DataFrames from [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl). For a deeper explanation of DataFramesMeta.jl, see the [documentation](https://github.com/JuliaData/DataFramesMeta.jl). 


Maybe add that this is a DSL: the syntax is more convenient, at the cost of not being valid Julia code.

On the other hand, DataFramesMeta.jl concepts try to mirror DataFrames.jl concepts, which I think is important for learning and using both.

Should be clearer now.

bkamins · 2021-08-02T06:49:23Z

docs/src/dplyr.md

+
+Like dplyr, the DataFramesMeta.jl package contains a set of macros (or "verbs") that perform common data manipulation operations such as filtering for rows, selecting specific columns, re-ordering rows, adding new columns and summarizing data. 
+
+In addition, DataFramesMeta.jl contains a useful operation `@combine` to perform another common task which is the "split-apply-combine" concept. We will discuss that in a little bit. 


This sentence is not clear to me, and it seems more detailed than the previous one, especially since the previous one already mentions "summarizing data".
Also, if you keep this, maybe link to an explanation of "split-apply-combine" so readers know what we mean (not all of them will know the term).

Hopefully it is clearer now.

bkamins · 2021-08-02T06:56:08Z

docs/src/dplyr.md

+
+# Important DataFramesMeta.jl Verbs To Remember
+
+dplyr verbs | Description


Why do you call them dplyr verbs?

The base tutorial this came from uses the term "verb". I think the author likes the term because it sounds less technical than "function".

I am OK with "verb". I am not clear why you use the term "dplyr": these seem to be DataFramesMeta.jl verbs.

Oh, that was a typo, sorry.

bkamins · 2021-08-02T06:58:11Z

docs/src/dplyr.md

+`@combine` | summarise values
+`groupby` | allows for group operations in the "split-apply-combine" concept
+
+DataFramesMeta.jl also provides `@rselect`, `@rsubset`, `@rorderby`, and `@rtransform` for operations which act row-wise. We will explore the distinction between column-wise and row-wise transformations later in this tutorial.


Maybe use the term "whole-column" rather than "column-wise"? Alan Edelman was confused by "col-wise": it suggests that one operation works vertically and the other horizontally, which is not the case.

Good point. Hopefully the language is clearer.

bkamins · 2021-08-02T06:59:18Z

docs/src/dplyr.md

+sleepData = @select msleep :name :sleep_total
+```
+
+To select all the columns *except* a specific column, use the `Not` function for inverse selection. We preface the `Not` with `$` because it does not reference a column directly as a `Symbol`.


The explanation of `$` is not clear. The reader cannot tell what would happen if `$` were skipped.

Fixing this. But we should merge a PR special-casing `Not`, `Between`, `Regex`, and `r"..."` so we don't have to worry about this.

bkamins · 2021-08-02T07:10:11Z

log.txt

@@ -0,0 +1,11 @@
+Doctests: DataFramesMeta: Test Failed at /home/peterwd/.julia/packages/Documenter/oBZFM/src/Documenter.jl:870


I would not put this log file in the PR.

src/DataFramesMeta.jl

pdeffebach · 2021-08-04T02:00:44Z

Thanks for the review! Should be much improved now.

…Meta.jl into dplyr_port

docs/src/dplyr.md

bkamins · 2021-08-04T13:09:39Z

docs/src/dplyr.md

+@select msleep $varnames
+```
+
+Similarly, to select the first column, use the syntax `$1`. 


Is `$` required here?

Yes.

Right now, the parsing for selecting columns is exactly the same as the parsing for anonymous functions. `@transform df :y = :x .+ 1` would be ambiguous if we allowed a bare `1` to be a column selector inside the anonymous function, so we need the same rule when doing `@select`.

Not ideal, though. We can change this before 1.0.

I guess it is :).

I think `$1` makes sense. It just should probably be explained well somewhere.

docs/src/dplyr.md

Fix bug disallowing `@rsubset(df, :a, :b, :c)`

pdeffebach · 2021-08-04T14:05:29Z

Thanks!

pdeffebach added 3 commits August 1, 2021 07:24

initial commit

1caa748

pre_fix changes

6cc9d9b

lots of fixes

2dbb295